310 research outputs found

    Family- and population-based designs identify different rare causal variants

    Both family- and population-based samples are used to identify genetic variants associated with phenotypes. Each strategy has demonstrated advantages, but their ability to identify rare variants and genes containing rare variants is unclear. To compare these two study designs in the identification of rare causal variants, we applied various methods to the population- and family-based data simulated by the Genetic Analysis Workshop 17 with knowledge of the simulated model. Our results suggest that different variants can be identified by different study designs. Family-based and population-based study designs can be complementary in the identification of rare causal variants and should be considered in future studies.

    A Robust Statistical Method for Association-Based eQTL Analysis

    Background: It is well established that the theoretical kernel of the genome-wide association study (GWAS) is statistical inference of linkage disequilibrium (LD) between a tested genetic marker and a putative locus affecting a disease trait. However, LD analysis is vulnerable to several confounding factors, of which population stratification is the most prominent. Although many methods have been proposed to correct for its influence, either by predicting the structure parameters or by correcting the stratification-induced inflation in the test statistic, these may not be feasible or may introduce further statistical problems in practice. Methodology: We propose a novel statistical method to control spurious LD arising from population structure in GWAS by incorporating a control marker into the test for association between a polymorphic marker and phenotypic variation of a complex trait. The method avoids the need for structure prediction, which may be infeasible or inadequate in practice, and properly accounts for the varying effect of population stratification across different regions of the genome. The utility and statistical properties of the new method were tested through an intensive computer simulation study and an association-based genome-wide mapping of expression quantitative trait loci in genetically divergent human populations. Results/Conclusions: The analyses show that the new method confers an improved statistical power for detecting genuin

    The NEI/NCBI dbGAP database: Genotypes and haplotypes that may specifically predispose to risk of neovascular age-related macular degeneration

    Background: To examine whether the significantly associated SNPs derived from the genome-wide allelic association study on the AREDS cohort at the NEI (dbGAP) specifically confer risk for neovascular age-related macular degeneration (AMD), we ascertained 134 unrelated patients with AMD, each of whom had one sibling with an AREDS classification of 1 or less who was past the age at which the affected sibling was diagnosed (268 subjects). Genotyping was performed by both direct sequencing and Sequenom iPLEX system technology. Single SNP analyses were conducted with McNemar's test (both 2 × 2 and 3 × 3 tests) and likelihood ratio tests (LRT). Conditional logistic regression was used to determine significant gene-gene interactions. LRT was used to determine the best fit for each genotypic model tested (additive, dominant or recessive). Results: Before release of individual data, p-value information was obtained directly from the AREDS dbGAP website. Of the 35 variants with P < 10^-6 examined, 23 significantly modified risk of neovascular AMD. Many variants located in tandem on 1q32-q22, including those in CFH, CFHR4, CFHR2, CFHR5, F13B, ASPM and ZBTB, were significantly associated with AMD risk. Of these variants, single SNP analysis revealed that CFH rs572515 was the most significantly associated with AMD risk (P < 10^-6). Haplotype analysis supported our findings of single SNP association, demonstrating that the most significant haplotype, GATAGTTCTC, spanning CFH, CFHR4, and CFHR2, was associated with the greatest risk of developing neovascular AMD (P < 10^-6). Other than variants on 1q32-q22, only two SNPs, rs9288410 (MAP2) on 2q34-q35 and rs2014307 (PLEKHA1/HTRA1) on 10q26, were significantly associated with AMD status (P = .03 and P < 10^-6, respectively). After controlling for smoking history, gender and age, the most significant gene-gene interaction appears to be between rs10801575 (CFH) and rs2014307 (PLEKHA1/HTRA1) (P < 10^-11). The best genotypic fit for rs10801575 and rs2014307 was an additive model based on LRT. After applying a Bonferroni correction, no other significant interactions were identified between any other SNPs. Conclusion: This is the first replication study on the NEI dbGAP SNPs, demonstrating that alleles on 1q, 2q and 10q may predispose an individual to AMD.
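
The single-SNP analysis in the abstract above relies on McNemar's test for matched sibling pairs, followed by a Bonferroni correction. As a rough illustration only, the 2 × 2 version of the test can be sketched in plain Python as below. The function names, the discordant-pair counts, and the choice to omit a continuity correction are assumptions made for this sketch and are not taken from the study.

```python
import math


def mcnemar_test(b, c):
    """McNemar's chi-square test (no continuity correction) for matched pairs.

    b: discordant pairs where only the affected sibling carries the allele
    c: discordant pairs where only the unaffected sibling carries the allele
    Concordant pairs do not enter the statistic.
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts, chosen only to show the call pattern.
stat, p = mcnemar_test(15, 5)
adjusted = bonferroni(p, 35)  # 35 echoes the number of variants examined
```

Only discordant pairs carry information here, which is why a sibling-pair design with few discordant pairs has little power regardless of the total number of pairs.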

    Three Ways of Combining Genotyping and Resequencing in Case-Control Association Studies

    We describe three statistical results that we have found to be useful in case-control genetic association testing. All three involve combining the discovery of novel genetic variants, usually by sequencing, with genotyping methods that recognize previously discovered variants. We first consider expanding the list of known variants by concentrating variant discovery in cases. Although the naive inclusion of cases-only sequencing data would create a bias, we show that some sequencing data may be retained, even if controls are not sequenced. Furthermore, for alleles of intermediate frequency, cases-only sequencing with bias correction entails little if any loss of power, compared to dividing the same sequencing effort among cases and controls. Second, we investigate more strongly focused variant discovery to obtain a greater enrichment for disease-related variants. We show how case status, family history, and marker sharing enrich the discovery set by increments that are multiplicative with penetrance, enabling the preferential discovery of high-penetrance variants. A third result applies when sequencing is the primary means of counting alleles in both cases and controls, but a supplementary pooled genotyping sample is used to identify the variants that are very rare. We show that this raises no validity issues, and we evaluate a less expensive and more adaptive approach to judging rarity, based on group-specific variants. We demonstrate the important and unusual caveat that this method requires equal sample sizes for validity. These three results can be used to more efficiently detect the association of rare genetic variants with disease.

    Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach

    Background - The prediction of the genetic disease risk of an individual is a powerful public health tool. While predicting risk has been successful in diseases which follow simple Mendelian inheritance, it has proven challenging in complex diseases for which a large number of loci contribute to the genetic variance. The large numbers of single nucleotide polymorphisms now available provide new opportunities for predicting genetic risk of complex diseases with high accuracy. Methodology/Principal Findings - We have derived simple deterministic formulae to predict the accuracy of predicted genetic risk from population or case control studies using a genome-wide approach and assuming a dichotomous disease phenotype with an underlying continuous liability. We show that the prediction equations are special cases of the more general problem of predicting the accuracy of estimates of genetic values of a continuous phenotype. Our predictive equations are responsive to all parameters that affect accuracy and they are independent of allele frequency and effect distributions. Deterministic prediction errors when tested by simulation were generally small. The common link among the expressions for accuracy is that they are best summarized as the product of the ratio of number of phenotypic records per number of risk loci and the observed heritability. Conclusions/Significance - This study advances the understanding of the relative power of case control and population studies of disease. The predictions represent an upper bound of accuracy which may be achievable with improved effect estimation methods. The formulae derived will help researchers determine an appropriate sample size to attain a certain accuracy when predicting genetic ris

    AWclust: point-and-click software for non-parametric population structure analysis

    Background: Population structure analysis is important to genetic association studies and evolutionary investigations. Parametric approaches, e.g. STRUCTURE and L-POP, usually assume Hardy-Weinberg equilibrium (HWE) and linkage equilibrium among loci in the sampled individuals. However, these assumptions may not hold, and allele frequency estimation may not be accurate in some data sets. The improved version of STRUCTURE (version 2.1) can incorporate linkage information among loci but is still sensitive to high background linkage disequilibrium. Large-scale single nucleotide polymorphism (SNP) data sets are now becoming popular in genetic studies. It is therefore imperative to have software that makes full use of these genetic data to generate inference even when model assumptions do not hold or allele frequency estimation suffers from high variation. Results: We have developed point-and-click software for non-parametric population structure analysis, distributed as an R package. The software takes advantage of the large number of SNPs available to categorize individuals into ethnically similar clusters; it requires no assumptions about population models and does not estimate allele frequencies. Moreover, it can also infer the optimal number of populations. Conclusion: Our software tool employs non-parametric approaches to assign individuals to clusters using SNPs. It provides efficient computation and an intuitive way for researchers to explore ethnic relationships among individuals. It can be complementary to parametric approaches in population structure analysis.

    A random forest approach to the detection of epistatic interactions in case-control studies

    Background: The key roles of epistatic interactions between multiple genetic variants in the pathogenesis of complex diseases notwithstanding, the detection of such interactions remains a great challenge in genome-wide association studies. Although some existing multi-locus approaches have been successful on small-scale case-control data, the "combination explosion" curse prohibits their application to genome-wide analysis. It is therefore indispensable to develop new methods that can reduce the search space for epistatic interactions from an astronomical number of all possible combinations of genetic variants to a manageable set of candidates. Results: We studied case-control data from the viewpoint of binary classification. More precisely, we treated single nucleotide polymorphism (SNP) markers as categorical features and adopted the random forest to discriminate cases from controls. On the basis of the Gini importance given by the random forest, we designed a sliding window sequential forward feature selection (SWSFS) algorithm to select a small set of candidate SNPs that could minimize the classification error, and then statistically tested up to three-way interactions of the candidates. We compared this approach with three existing methods on three simulated disease models and showed that our approach is comparable to, and sometimes more powerful than, the other methods. We applied our approach to a genome-wide case-control dataset for age-related macular degeneration (AMD) and successfully identified two SNPs that were reported to be associated with this disease. Conclusion: Besides existing purely statistical approaches, we demonstrated the feasibility of incorporating machine learning methods into genome-wide case-control studies. The Gini importance offers yet another measure for the associations between SNPs and complex diseases, thereby complementing existing statistical measures to facilitate the identification of epistatic interactions and the understanding of epistasis in the pathogenesis of complex diseases.
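
The abstract above ranks SNPs by the Gini importance reported by a random forest, which accumulates the Gini impurity decrease over every tree node that splits on a given feature. The sketch below shows only that underlying quantity for a single split, in plain Python. It is not the authors' SWSFS algorithm and not a random forest: the multiway split by genotype category, the function names, and the toy genotypes are all assumptions made for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def gini_decrease(genotypes, labels):
    """Impurity decrease from splitting samples by genotype category.

    Simplification: one multiway split per SNP, whereas tree learners
    commonly use binary splits and sum decreases over many nodes and trees.
    """
    n = len(labels)
    groups = {}
    for genotype, label in zip(genotypes, labels):
        groups.setdefault(genotype, []).append(label)
    weighted = sum(len(ys) / n * gini_impurity(ys) for ys in groups.values())
    return gini_impurity(labels) - weighted


# Toy data (hypothetical): 1 = case, 0 = control; genotypes coded 0/1/2.
status = [1, 1, 1, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 2, 0, 0, 0],  # separates cases from controls perfectly
    "snp_b": [0, 1, 0, 1, 0, 1],  # nearly uninformative
}
ranking = sorted(snps, key=lambda s: gini_decrease(snps[s], status), reverse=True)
```

A ranking of this kind only orders candidate SNPs; as the abstract notes, the selected candidates still need formal statistical testing for interactions.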